AI's New Escape Trick: Past Tense Prompts Instantly Crack GPT-4o and Six Other Models
Large language models (LLMs) have shown remarkable natural-language capabilities across successive generations, but they also carry real risks, such as generating toxic content, spreading misinformation, or assisting harmful activities. To prevent these outcomes, researchers train LLMs to refuse harmful requests, typically through supervised fine-tuning, reinforcement learning from human feedback, or adversarial training. However, a recent study found that this refusal training can be bypassed by a surprisingly simple trick: rephrasing the harmful request in the past tense.
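As a rough illustration of the idea, the attack inserts a reformulation step before the request ever reaches the target model: a helper prompt asks for the request to be restated as a question about the past. The sketch below is hypothetical; the template wording and function name are assumptions for illustration, not the study's exact prompt.

```python
# Hypothetical sketch of the past-tense reformulation step described in the
# article. The template text here is illustrative, not the study's exact prompt.

def to_past_tense_prompt(request: str) -> str:
    """Build a prompt asking a helper LLM to restate `request` in the
    past tense (e.g. "How do I X?" -> "How did people X?")."""
    return (
        "Rewrite the following request as a question about the past, "
        "keeping its meaning otherwise unchanged:\n"
        f"Request: {request}\n"
        "Past-tense version:"
    )

# Example usage: this string would be sent to a reformulation model, and the
# model's past-tense output would then be forwarded to the target model.
prompt = to_past_tense_prompt("How do I pick a lock?")
print(prompt)
```

In the study's setup, the reformulated question is what gets sent to GPT-4o and the other evaluated models; the refusal training keyed on present-tense phrasing then often fails to trigger.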